Uncertain Data Integration Using Functional Dependencies
Authors
Abstract
Data integration systems are crucial for applications that need to provide a uniform interface to a set of autonomous and heterogeneous data sources. However, setting up a full data integration system in many application contexts, e.g. web and scientific data management, requires significant human effort, which prevents it from scaling. In this paper, we propose IFD (Integration based on Functional Dependencies), a pay-as-you-go data integration system that integrates a given set of data sources and can incrementally integrate additional sources. IFD takes advantage of the background knowledge implied by functional dependencies to match the source schemas. Our system is built on a probabilistic data model that captures the uncertainty in data integration systems. Our performance evaluation shows significant gains in recall and precision over the baseline approaches. The results confirm the importance of functional dependencies and the contribution of the probabilistic data model to the quality of schema matching. The analytical study and experiments show that IFD scales well.
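For readers unfamiliar with the term, a functional dependency X -> Y holds in a relation when any two rows that agree on the attributes X also agree on the attributes Y. The sketch below only illustrates that definition on a small in-memory table. It is not the IFD matching algorithm, and the function name `fd_holds` and the sample `employees` data are hypothetical, introduced purely for illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows.

    The dependency holds when every combination of lhs values is
    associated with exactly one combination of rhs values.
    """
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first rhs value seen for this lhs key;
        # a later row with a different rhs value violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample relation, not taken from the paper.
employees = [
    {"emp_id": 1, "dept": "sales", "city": "Paris"},
    {"emp_id": 2, "dept": "sales", "city": "Paris"},
    {"emp_id": 3, "dept": "lab", "city": "Lyon"},
    {"emp_id": 4, "dept": "lab", "city": "Nice"},
]

emp_determines_dept = fd_holds(employees, ["emp_id"], ["dept"])
dept_determines_city = fd_holds(employees, ["dept"], ["city"])
```

In this sample, `emp_id -> dept` holds because each `emp_id` appears once, while `dept -> city` does not, because the `lab` department is paired with two different cities.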
Similar resources
An Uncertain Data Integration System
Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. An important task in setting up a data integration system is to match the attributes of the source schemas. In this paper, we propose a data integration system which uses the knowledge implied within functional dependencies for matching the source schemas. We build our system on a probabilistic ...
Probabilistic XML functional dependencies based on possible world model
With the increase of uncertain data in many new applications, such as sensor networks, data integration, web extraction, etc., uncertainty in both relational databases and XML datasets has attracted more and more research interest in recent years. As functional dependencies (FDs) are critical and necessary to schema design and data rectification in relational databases and XML datasets, it is a...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Integration of Web Sources Under Uncertainty and Dependencies Using Probabilistic XML
We explore in this paper the problem of integrating several web data sources under uncertainty and dependencies. We present a real application of this from web sources about objects in the maritime domain where uncertainties and dependencies are ubiquitous. Uncertainties are mainly caused by imprecise data trackers and imperfect human knowledge whereas dependencies come from the frequent copyin...
Pay-As-You-Go Data Integration Using Functional Dependencies
Setting up a full data integration system for many application contexts, e.g. web and scientific data management, requires significant human effort which prevents it from being really scalable. In this paper, we propose IFD (Integration based on Functional Dependencies), a pay-as-you-go data integration system that allows integrating a given set of data sources, as well as incrementally integra...
Publication date: 2017